The Role of Artificial Intelligence in Crisis Communication: A Computational Analysis of Organizational Press Releases

# Project Overview

This project investigates how U.S. organizations frame AI in press releases concerning disaster response and management. Specifically, it examines how AI is framed in organizational press releases related to disaster management, what key themes and dominant narratives characterize discussions of AI in the context of natural disasters and crises in the United States, and what potential risks and ethical concerns regarding AI usage are highlighted in these communications.

# Objective of the Project

Analyze how U.S. organizations frame AI in press releases during disaster response and management efforts.

Research Questions

RQ1. How do U.S. organizations frame AI in press releases in the context of disasters?
RQ2. What are the most prevalent frames about AI in organizational press releases related to disasters?
RQ3. How have these frames about AI changed over time?
RQ4. What sentiments are associated with AI, and how have they changed over time?

Data collection

Data were collected from the Nexis Uni database, focusing on press releases disseminated through major news wires. The collection period spans from November 1, 2019, to November 1, 2024, using the keywords “artificial intelligence” OR “ai” OR “generative ai” OR “machine learning” OR “deep learning” AND “disaster” OR “disaster management” OR “disaster communication” OR “disaster response” OR “disaster preparedness”. A total of 7,470 press releases published over the last five years were collected, distributed across years as follows.

| No | Year      | Number of Press Releases |
|----|-----------|--------------------------|
| 1  | 2024      | 1722                     |
| 2  | 2023      | 1750                     |
| 3  | 2022      | 1433                     |
| 4  | 2021      | 1332                     |
| 5  | 2019-2020 | 1333                     |

The final dataset comprises 7,470 press releases sourced mainly from the following news wires:

  1. States News Service
  2. Targeted News Service
  3. PR Newswire
  4. Business Wire
  5. GlobeNewswire

These press releases explicitly mention “Artificial Intelligence” or “AI” in conjunction with disasters.

A Quick Overview of the data

Total Press Releases: 7,470
Average Releases Per Month: 110
Top Mentioned Keywords: “Artificial Intelligence,” “AI,” “Disaster”

Dominant Headline Themes in the Press Releases: Ethical Concerns, Public Trust, AI Benefits, AI Limitations

Regional Focus: United States

Organizations Highlighted: FEMA, Department of Homeland Security, Major Tech Companies, Non-Governmental Organizations (NGOs)

Data Analysis Approach

A computational textual analysis approach is employed to analyze the collected press releases. The analysis is conducted using R programming, leveraging various libraries and packages to facilitate data processing and analysis. The key steps include:

  1. Data Cleaning and Preparation
  2. Content Analysis
  3. Sentiment Analysis
  4. Topic Modeling & Thematic Analysis
  5. Visualization and Reporting

Expected Outcomes

  1. Comprehensive Understanding
  2. Theme Identification
  3. Ethical Insights
  4. Strategic Recommendations

## Repository Highlights

  1. GitHub Repository: FramingAI
  2. Data Access: FramingAI Data Repository
  3. Codebook Structure
  4. Data Samples
  5. Descriptive Statistics
  6. Analytical Tools

Appropriate libraries

library(tidyverse)   # includes dplyr, ggplot2, purrr, readr, stringr, etc.
library(pdftools)
library(textdata)
library(tidytext)
library(quanteda)
library(rio)
library(janitor)
library(here)
library(plotly)      # for the interactive plots below
# topic modeling
library(tm)
library(topicmodels)
library(lda)
library(ldatuning)
# tables and visualization
library(DT)
library(knitr) 
library(kableExtra) 
library(reshape2)
library(wordcloud)
library(pals)
library(SnowballC)
library(flextable)

## Defining the directory containing the PDFs

# Defining the directory containing the PDFs
directory <- "../ai_disaster"

# Getting all PDF file paths in the directory
file_paths <- list.files(path = directory, pattern = "\\.PDF$", full.names = TRUE)

Extracting text from the PDFs, splitting the combined text on “End of Document”, and writing each press release to its own text file.

# Extracting the Text from PDFs: Combining the text from all PDFs.
combined_text <- sapply(file_paths, function(path) {
  pdf_text(path) %>% paste(collapse = "\n")
}) %>% paste(collapse = "\n")

# Splitting and saving as documents: split the combined text on "End of Document" and save each press release as an individual text file.

documents <- strsplit(combined_text, "End of Document")[[1]]
output_dir <- "../ai_disaster/extracted"

# Ensure the directory exists
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

# Write each document to a text file
for (i in seq_along(documents)) {
  output_file <- file.path(output_dir, paste0("FramingAI_extracted", i, ".txt"))
  writeLines(documents[[i]], output_file)
}

cat("Files created:", length(documents), "\n")
## Files created: 7471

## Creating a data frame with file names and their corresponding content

# List all text files in the output directory
extracted_files <- list.files(output_dir, pattern = "\\.txt$", full.names = TRUE)

# Read the content of each file into a list
documents_text <- lapply(extracted_files, function(file) {
  readLines(file) %>% paste(collapse = "\n")  # Combine lines into a single text block
})

# Create a data frame with file names and their corresponding content
documents_df <- tibble(
  document_id = basename(extracted_files),  # Extract file names (without path)
  document_text = documents_text  # Content of each document
)

# View the first few rows of the data frame
head(documents_df)
## # A tibble: 6 × 2
##   document_id                 document_text
##   <chr>                       <list>       
## 1 FramingAI_extracted1.txt    <chr [1]>    
## 2 FramingAI_extracted10.txt   <chr [1]>    
## 3 FramingAI_extracted100.txt  <chr [1]>    
## 4 FramingAI_extracted1000.txt <chr [1]>    
## 5 FramingAI_extracted1001.txt <chr [1]>    
## 6 FramingAI_extracted1002.txt <chr [1]>

## Extracting metadata and building the final data frame
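The metadata-extraction step itself is not reproduced in this document: the `final_data` object (index, title, date, publication) appears below without its construction. A minimal, hypothetical sketch of how it could be built from the headers of the extracted text files follows; the title and date parsing rules here are assumptions, not the original code.

```r
# Hypothetical sketch: build final_data from the headers of the extracted
# text files. Parsing rules are assumed.
library(tidyverse)

final_data <- tibble(
  index = seq_along(extracted_files),
  header = map(extracted_files, readLines, n = 10)   # first lines of each file
) %>%
  mutate(
    title = map_chr(header, ~ str_squish(.x[1])),     # treat the first line as the title
    date  = map_chr(header, ~ str_extract(
      str_subset(.x, "[A-Za-z]+ \\d{1,2}, \\d{4}")[1],  # first date-like line, if any
      "[A-Za-z]+ \\d{1,2}, \\d{4}"
    )) %>% lubridate::mdy(),
    publication = NA_character_                       # mostly absent, as the preview shows
  ) %>%
  select(index, title, date, publication)
```

Note that titles recovered this way can be residue such as URLs or page markers, which matches the preview below.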

## Building the final index

# Step 1: Create a File Name List from File Paths
file_names <- basename(file_paths)  # Extract just the file names from the full paths

# Step 2: Create a Mapping DataFrame
file_mapping <- tibble(
  index = seq_along(file_names),  # Use sequential indices for each file
  filename = file_names          # Map file names to indices
)

# Step 3: Join File Mapping with Final Data
final_index <- final_data |> 
  inner_join(file_mapping, by = "index") |>  # Join based on index
  mutate(
    filepath = paste0("../ai_disaster/extracted/", filename)  # Construct full file paths
  )

# Step 4: Display the Head of the DataFrame with File Names and Content
print(head(final_index))
## # A tibble: 6 × 6
##   index title                           date       publication filename filepath
##   <int> <chr>                           <date>     <chr>       <chr>    <chr>   
## 1     1 http://www.businesswire.com     2019-11-02 <NA>        Framing… ../ai_d…
## 2     2 Page 1 of 2                     2019-11-02 <NA>        Framing… ../ai_d…
## 3     3 Webroot Announces Business End… 2019-11-02 <NA>        Framing… ../ai_d…
## 4     4 Empower MSPs to Do Business Th… 2019-11-02 <NA>        Framing… ../ai_d…
## 5     5 Webroot Announces Business End… 2019-11-02 <NA>        Framing… ../ai_d…
## 6     6 SyncroMSP; New Integration Hel… 2019-11-02 <NA>        Framing… ../ai_d…

Merging the document text (documents_df) with the final index

# Rename 'document_id' to 'filename' for consistency
documents_df <- documents_df %>%
  rename(filename = document_id)

# Merge final_index with documents_df on 'filename'
merged_data <- final_index %>%
  left_join(documents_df, by = "filename")

# Check for any missing text data
missing_text <- merged_data %>%
  filter(is.na(document_text))

if (nrow(missing_text) > 0) {
  warning("There are documents with missing text data:")
  print(missing_text)
} else {
  cat("All documents have corresponding text data.\n")
}
## All documents have corresponding text data.
# Save the merged data to a CSV file for future use
write_csv(merged_data, "~/Desktop/Code/FramingAI/ai_disaster/final_merged_data.csv")

# Preview the merged data
head(merged_data)
## # A tibble: 6 × 7
##   index title             date       publication filename filepath document_text
##   <int> <chr>             <date>     <chr>       <chr>    <chr>    <list>       
## 1     1 http://www.busin… 2019-11-02 <NA>        Framing… ../ai_d… <chr [1]>    
## 2     2 Page 1 of 2       2019-11-02 <NA>        Framing… ../ai_d… <chr [1]>    
## 3     3 Webroot Announce… 2019-11-02 <NA>        Framing… ../ai_d… <chr [1]>    
## 4     4 Empower MSPs to … 2019-11-02 <NA>        Framing… ../ai_d… <chr [1]>    
## 5     5 Webroot Announce… 2019-11-02 <NA>        Framing… ../ai_d… <chr [1]>    
## 6     6 SyncroMSP; New I… 2019-11-02 <NA>        Framing… ../ai_d… <chr [1]>

## Cleaning the Titles

## Cleaning the document text

library(tidytext)
library(dplyr)
library(stringr)
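The tokenization below starts from `final_data_cleaned`, which is not constructed in the shown code. A plausible sketch, assuming it is derived from `merged_data` by flattening the list-column of text and lightly tidying the titles:

```r
# Assumed sketch: derive final_data_cleaned from merged_data.
final_data_cleaned <- merged_data %>%
  mutate(
    document_text = map_chr(document_text, paste, collapse = "\n"),  # flatten list-column
    title = str_squish(title)                                        # collapse stray whitespace
  ) %>%
  filter(!str_detect(title, "^Page \\d+ of \\d+$")) %>%  # drop page-marker pseudo-titles
  rename(doc_index = index)                              # column name shown in later previews
```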

# Tokenize the document text into sentences
final_data_cleaned2 <- final_data_cleaned %>%
  unnest_tokens(sentence, document_text, token = "sentences")

# Define patterns to filter out irrelevant content
irrelevant_patterns <- c(
  "^\\s*$",                 
  "^\\d+$",                 
  "^Page \\d+",           
  "^\\d{1,2} of \\d{1,2}$",
  "^©.*$",                 
  "^[A-Za-z]{1}$",          
  "^[^a-zA-Z]+$",           
  "Disclaimer",             
  "Legal Notice",           
  "Table of Contents",      
  "For Immediate Release",  
  "Addendum",              
  "Appendix",         
  "Lorem Ipsum",           
  "^[-=_]{3,}$",            
  "^[A-Za-z]+\\s?\\d{1,2},\\s?\\d{4}$", 
  "^\\d{1,2}:\\d{2}\\s?(AM|PM)?$",      
  "^End of Document$",      
  "^Table \\d+$",          
  "^Chart \\d+$",           
  "^Figure \\d+$",          
  "^[^\\s]{20,}$"           
)

# Process and filter the data
final_data_cleaned2 <- final_data_cleaned2 %>%
  filter(!str_detect(sentence, paste(irrelevant_patterns, collapse = "|"))) %>%  # Remove irrelevant patterns
  filter(str_count(sentence, "\\w+") >= 3) %>%  # At least 3 words
  filter(str_count(sentence, "[^a-zA-Z0-9]") / nchar(sentence) < 0.5) %>%  # Fewer than 50% non-alphanumeric characters
  rowwise() %>%  # Enable row-wise operations
  mutate(
    unique_chars = length(unique(str_split(sentence, "")[[1]]))
  ) %>%  # Count unique characters per sentence
  filter(unique_chars > 5) %>%  # More than 5 unique characters
  ungroup() %>%  # Remove row-wise grouping
  select(-unique_chars) %>%  # Drop temporary column
  distinct()  # Remove duplicate rows

# View first rows of cleaned data
head(final_data_cleaned2)
## # A tibble: 6 × 7
##   doc_index title              date       publication filename filepath sentence
##       <int> <chr>              <date>     <chr>       <chr>    <chr>    <chr>   
## 1         1 Untitled Document… 2019-11-02 <NA>        Framing… ../ai_d… "user n…
## 2         1 Untitled Document… 2019-11-02 <NA>        Framing… ../ai_d… "imperi…
## 3         1 Untitled Document… 2019-11-02 <NA>        Framing… ../ai_d… "rob's …
## 4         1 Untitled Document… 2019-11-02 <NA>        Framing… ../ai_d… "americ…
## 5         1 Untitled Document… 2019-11-02 <NA>        Framing… ../ai_d… "talkde…
## 6         1 Untitled Document… 2019-11-02 <NA>        Framing… ../ai_d… "portma…

Bigrams

# Load required libraries
library(dplyr)
library(stringr)
library(tidytext)
bigrams <- final_data_cleaned2 %>% 
  select(sentence) %>% 
  mutate(
    sentence = str_squish(sentence),                      # Remove extra spaces
    sentence = tolower(sentence),
    sentence = str_replace_all(sentence, c(
    "copyright" = "",
    "new york times"="",
    "publication"="",
    "www.alt"="",
    "http"=""))) %>% 
  unnest_tokens(bigram, sentence, token = "ngrams", n = 2) %>% 
  separate(bigram, c("word1", "word2"), sep = " ") %>% 
  filter(!word1 %in% stop_words$word) %>%                 # Filter out stop words
  filter(!word2 %in% stop_words$word) %>% 
  count(word1, word2, sort = TRUE) %>% 
  filter(!is.na(word1) & !is.na(word2))   
    
# Define the pattern to remove specific unwanted terms.
# Note: short unanchored alternatives such as "ac" and "af" also strip those
# letter sequences inside words (visible as "mhine"/"pific" in the output below).
remove_pattern <- paste(
  "title|pages|publication date|publication subject|publication type|issn|language of publication: english|",
  "document url|copyright|news|service|initially|vol|issue|filed|ms|virginia|alexandria|last updated|database|startofarticle|af|rights|october|reserved|september|research articles|proquest document id|",
  "classification|https|--|people|alt|article|page|based|language|english|length|words|publication|type|morg|york|times|'new york times'|publication   info|illustration|date|caption|[0-9.]|new york times|identifier/keyword|twitter\\.|rauchway|keynes's|_ftn|enwikipediaorg|",
  "wwwnytimescom|wwwoenbat|wwwpresidencyucsbedu|wwwalt|wwwthemoneyillusioncom|aaa|predated|a_woman_to_reckon_with_the_vision_and_legacy_of_fran|ab_se|",
  "jcr:fec|ac|___________________|\\bwww\\b|[_]+",
  sep = ""
)
# Process bigrams
bigrams <- final_data_cleaned2 %>% 
  select(sentence) %>% 
  mutate(
    sentence = str_squish(sentence),                      # Remove extra spaces
    sentence = tolower(sentence),                         # Convert to lowercase
    sentence = str_replace_all(sentence, remove_pattern, ""), # Remove unwanted terms
    sentence = str_replace_all(sentence, "- ", ""),       # Remove trailing hyphens
    sentence = str_replace_all(sentence, "\\b[a-zA-Z]\\b", "") # Remove single characters
  ) %>% 
  unnest_tokens(bigram, sentence, token = "ngrams", n = 2) %>% 
  separate(bigram, c("word1", "word2"), sep = " ") %>% 
  filter(!word1 %in% stop_words$word) %>%                 # Filter out stop words
  filter(!word2 %in% stop_words$word) %>% 
  filter(!str_detect(word1, remove_pattern)) %>%          # str_detect (not %in%) is needed to match a regex
  count(word1, word2, sort = TRUE) %>% 
  filter(!is.na(word1) & !is.na(word2))                   # Filter out NAs

bigrams
## # A tibble: 1,184,674 × 3
##    word1      word2            n
##    <chr>      <chr>        <int>
##  1 artificial intelligence  8474
##  2 climate    change        5702
##  3 national   security      4135
##  4 air        force         4020
##  5 assigned   patent        3761
##  6 mhine      learning      3739
##  7 indo       pific         3635
##  8 pr         wire          3092
##  9 disaster   recovery      2968
## 10 homeland   security      2879
## # ℹ 1,184,664 more rows

Top 20 bigrams

top_20_bigrams <- bigrams |> 
   top_n(20) |> 
  mutate(bigram = paste(word1, " ", word2)) |> 
  select(bigram, n)
## Selecting by n

Visualization of bigrams

ggplot(top_20_bigrams, aes(x = reorder(bigram, n), y = n, fill=n)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  coord_flip() +  
  labs(title = "Top Two-Word phrases in FramingAI articles",
       caption = "n=7470 Press Releases. Graphic by Taufiq Ahmad. 12-08-2024",
       x = "Phrase",
       y = "Count of terms")

Sentiment Analysis

### AFINN Lexicon Sentiment Analysis

Interactive Sentiment Analysis Over Time
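The plotting code below uses `sentiment_over_time`, which is not computed in the shown code. A minimal sketch, under the assumption that word-level AFINN scores are averaged per publication date:

```r
# Assumed sketch: average AFINN sentiment per date.
afinn <- get_sentiments("afinn")  # prompts a one-time lexicon download via textdata

sentiment_over_time <- final_data_cleaned2 %>%
  unnest_tokens(word, sentence) %>%
  inner_join(afinn, by = "word") %>%
  group_by(date) %>%
  summarise(
    average_sentiment = mean(value),  # AFINN values range from -5 to +5
    sentence_count = n(),             # here: count of scored words per date
    .groups = "drop"
  )
```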

# Create the interactive ggplot
p1 <- ggplot(sentiment_over_time, aes(x = date, y = average_sentiment)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(aes(text = paste0(
    "Date: ", format(date, "%Y-%m-%d"), "<br>",
    "Avg Sentiment: ", round(average_sentiment, 2), "<br>",
    "Sentences: ", sentence_count
  )), 
  color = "darkred", size = 2) +
  geom_smooth(method = "loess", se = TRUE, color = "darkgreen", fill = "lightgreen", size = 1) +
  labs(
    title = "Sentiment Analysis Over Time",
    x = "Years",
    y = "Average Sentiment Score"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 30, hjust = .05),
    plot.title = element_text(hjust = 0.5, size = 10, face = "bold")
  )
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning in geom_point(aes(text = paste0("Date: ", format(date, "%Y-%m-%d"), :
## Ignoring unknown aesthetics: text
# Convert ggplot to plotly object for interactivity
interactive_plot <- ggplotly(p1, tooltip = "text") %>%
  layout(
    title = list(text = "Sentiment Analysis Over Time", x = 0.5),
    xaxis = list(title = "Date"),
    yaxis = list(title = "Sentiment Score"),
    hovermode = "closest"
  )
## `geom_smooth()` using formula = 'y ~ x'
# Display the interactive plot
interactive_plot

Heatmap of Average Sentiment Over Years and Months
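The heatmap below relies on `sentiment_year_month`, which is likewise not shown; it is assumed to aggregate the daily AFINN averages by year and month, e.g.:

```r
# Assumed sketch: unweighted mean of daily sentiment averages per year-month cell.
sentiment_year_month <- sentiment_over_time %>%
  mutate(
    year  = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  ) %>%
  group_by(year, month) %>%
  summarise(average_sentiment = mean(average_sentiment), .groups = "drop")
```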

# Create the heatmap
heatmap_plot <- ggplot(sentiment_year_month, aes(x = month, y = factor(year), fill = average_sentiment)) +
  geom_tile(color = "white") +  # White borders between tiles for clarity
  scale_fill_gradient(low = "lightpink", high = "darkred", name = "Avg Sentiment") +  # Custom red color scale
  labs(
    title = "Average Sentiment Over Years and Months",
    x = "Month",
    y = "Year"
  ) +
  theme_minimal() +  # Clean and minimal theme
  theme(
    axis.text.x = element_text(angle = 30, hjust = 0.5, size = 10),  # Rotate x-axis labels for readability and adjust size
    axis.text.y = element_text(size = 10),  # Adjust y-axis text size
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),  # Center and style the title
    aspect.ratio = 0.6,  # Adjust aspect ratio to make plot wider and tiles smaller
    legend.position = "right",  # Position legend on the right
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 6)
  )

# Display the heatmap
print(heatmap_plot)

Sentiment Analysis: NRC Lexicon

Trust sentiment over time
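The objects `text_tokenized` and `nrc_trust` used below are not defined in the shown code. A plausible construction, assuming a standard tidytext tokenization and the NRC Emotion Lexicon:

```r
# Assumed sketch: word tokens without stop words, plus the NRC "trust" subset.
text_tokenized <- final_data_cleaned2 %>%
  unnest_tokens(word, sentence) %>%
  anti_join(stop_words, by = "word")

nrc_sentiments <- get_sentiments("nrc")            # NRC Emotion Lexicon
nrc_trust <- nrc_sentiments %>% filter(sentiment == "trust")
```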

# Perform the join for trust sentiment
trust_sentiment_over_time <- text_tokenized %>%
  semi_join(nrc_trust, by = "word") %>%
  count(date, sort = TRUE)

# Add one to all trust counts per date for smoothing
trust_sentiment_summary <- trust_sentiment_over_time %>%
  group_by(date) %>%
  summarise(total_trust = sum(n) + 1) # add-one smoothing

# Visualization
ggplot(trust_sentiment_summary, aes(x = date, y = total_trust)) +
  geom_line(color = "blue") +
  geom_point(color = "darkblue") +
  labs(
    title = "Trust Sentiments Towards AI Over Time",
    x = "Date",
    y = "Trust Sentiment Count"
  ) +
  theme_minimal()

An overview of different sentiments
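The plot below uses `sentiments_all`, which is assumed to be a per-category count of NRC-matched words across the corpus:

```r
# Assumed sketch: total word count per NRC sentiment category.
# NRC maps one word to multiple categories, so the join is many-to-many.
sentiments_all <- text_tokenized %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(sentiment, sort = TRUE)
```

If all ten NRC categories are kept, the manual color scale in the plot would need entries for the remaining categories beyond the six it names.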

nrc_plot <- sentiments_all %>% 
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment)) + 
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +  
  geom_text(aes(label = n), hjust = 1.2, size = 4, color = "white") +  
  labs(
    title = "Sentiment Analysis of Press Releases on AI and Disaster",
    subtitle = "NRC Sentiment Analysis Breakdown",
    x = "Sentiment",
    y = "Sentiment Score"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 12, hjust = 0.5),  # Center and bold title
    plot.subtitle = element_text(size = 12, hjust = 0.5),  # Center subtitle
    plot.caption = element_text(hjust = 0, size = 10, face = "italic"),
    axis.text.x = element_text(size = 10, color = "grey40"),
    axis.text.y = element_text(size = 12, color = "grey40"),
    panel.grid.major.x = element_line(color = "grey90", linetype = "dotted"),  # Dotted grid lines
    panel.grid.major.y = element_blank(),  # Remove unnecessary grid lines
    legend.position = "none"  # Hide legend since colors are self-explanatory
  ) +
  scale_fill_manual(values = c(
    "positive" = "#1f77b4",   # Blue
    "negative" = "#d62728",   # Red
    "trust" = "#2ca02c",      # Green
    "fear" = "#9467bd",       # Purple
    "anticipation" = "#ff7f0e", # Orange
    "anger" = "#e377c2"       # Pink
  )) +  # Vibrant colors for each sentiment
  coord_flip()  # Flip for better readability

# Print the enhanced plot
print(nrc_plot)

Filtering NRC positive and negative sentiments

library(ggplot2)
nrc_positive <- nrc_sentiments %>%
  filter(sentiment == "positive")

FramingAI_positive <- text_tokenized %>%
  inner_join(nrc_positive) %>%
  count(date, sort = TRUE) %>%
  rename(positive_count = n)
## Joining with `by = join_by(word)`
nrc_negative <- nrc_sentiments %>%
  filter(sentiment == "negative")

FramingAI_negative <- text_tokenized %>%
  inner_join(nrc_negative) %>%
  count(date, sort = TRUE) %>%
  rename(negative_count = n)
## Joining with `by = join_by(word)`
# Combine Positive and Negative Counts
sentiments_comparison <- FramingAI_positive %>%
  full_join(FramingAI_negative, by = "date") %>%
  replace_na(list(positive_count = 0, negative_count = 0)) # Replace NA with 0

# Reshape for Visualization
sentiments_long <- sentiments_comparison %>%
  pivot_longer(cols = c(positive_count, negative_count), names_to = "sentiment", values_to = "count")

# Visualization
# Comparative Visualization of Positive and Negative Sentiments Over Time
# Improved Comparative Visualization of Positive and Negative Sentiments Over Time
ggplot(sentiments_long, aes(x = date, y = count, color = sentiment, group = sentiment)) +
  geom_line(size = 0.7) + # Reduced line thickness for a balanced appearance
  geom_point(size = 0.5) + # Moderate-sized points for emphasis
  scale_color_manual(
    values = c("positive_count" = "#1f77b4", "negative_count" = "#d62728"), # Custom colors
    labels = c("Positive Sentiments", "Negative Sentiments")
  ) +
  labs(
    title = "Sentiment Trends Over Time",
    subtitle = "Comparing Positive and Negative Sentiments in Press Releases",
    x = "Date",
    y = "Sentiment Count",
    color = "Sentiment Type"
  ) +  theme_minimal(base_size = 10) + # Compact text size
  theme(
    plot.title = element_text(face = "bold", size = 12, hjust = 0.5), # Centered and proportional title
    plot.subtitle = element_text(size = 10, hjust = 0.5), # Slightly smaller subtitle
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8), # Smaller x-axis text
    axis.text.y = element_text(size = 8), # Smaller y-axis text
    panel.grid.major = element_line(color = "grey85", linetype = "dotted"), # Subtle grid lines
    legend.position = "top",
    legend.title = element_text(size = 9), # Adjusted legend title size
    legend.text = element_text(size = 8) # Adjusted legend text size
  ) +
    ggplot2::annotate ("text", x = max(sentiments_comparison$date) - 10, y = max(sentiments_comparison$positive_count) - 5, 
           label = "Positive Sentiments Lead", color = "#1f77b4", size = 2.5, fontface = "italic") +
ggplot2::annotate ("text", x = max(sentiments_comparison$date) - 10, y = max(sentiments_comparison$negative_count) - 5, 
           label = "Negative Sentiments Spike", color = "#d62728", size = 2.5, fontface = "italic")

## Topic Modeling

Importing the data and processing it into a corpus (the relevant topic-modeling libraries were loaded above)

topic_data <- final_data_cleaned2 %>%
  select(filename, sentence) %>%
  as.data.frame() %>%
  rename(doc_id = filename, text= sentence)

# load stopwords
english_stopwords <- readLines("https://slcladal.github.io/resources/stopwords_en.txt", encoding = "UTF-8")
# create corpus object
corpus <- Corpus(DataframeSource(topic_data))
# Preprocessing chain
processedCorpus <- tm_map(corpus, content_transformer(tolower))
processedCorpus <- tm_map(processedCorpus, removeWords, english_stopwords)
processedCorpus <- tm_map(processedCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)
processedCorpus <- tm_map(processedCorpus, removeNumbers)
processedCorpus <- tm_map(processedCorpus, stemDocument, language = "en")
processedCorpus <- tm_map(processedCorpus, stripWhitespace)

DTM structure: rows correspond to the documents in the corpus, columns correspond to the terms in the documents, and cells contain the weights of the terms.

# compute document term matrix with terms >= minimumFrequency
minimumFrequency <- 5
DTM <- DocumentTermMatrix(processedCorpus, control = list(bounds = list(global = c(minimumFrequency, Inf))))
# have a look at the number of documents and terms in the matrix
dim(DTM)
## [1] 611553  34782
# due to vocabulary pruning, we have empty rows in our DTM
# LDA does not like this. So we remove those docs from the
# DTM and the metadata
sel_idx <- slam::row_sums(DTM) > 0
DTM <- DTM[sel_idx, ]
topic_data <- topic_data[sel_idx, ]
# number of topics
# K <- 20
K <- 6
# set random number generator seed
set.seed(9161)
#Latent Dirichlet Allocation, LDA
topicModel2 <- LDA(DTM, K, method="Gibbs", control=list(iter = 1000, verbose = 25, alpha = 0.2))
## K = 6; V = 34782; M = 606997
## Sampling 1000 iterations!
## Iteration 25 ...
## Iteration 50 ...
## Iteration 75 ...
## Iteration 100 ...
## Iteration 125 ...
## Iteration 150 ...
## Iteration 175 ...
## Iteration 200 ...
## Iteration 225 ...
## Iteration 250 ...
## Iteration 275 ...
## Iteration 300 ...
## Iteration 325 ...
## Iteration 350 ...
## Iteration 375 ...
## Iteration 400 ...
## Iteration 425 ...
## Iteration 450 ...
## Iteration 475 ...
## Iteration 500 ...
## Iteration 525 ...
## Iteration 550 ...
## Iteration 575 ...
## Iteration 600 ...
## Iteration 625 ...
## Iteration 650 ...
## Iteration 675 ...
## Iteration 700 ...
## Iteration 725 ...
## Iteration 750 ...
## Iteration 775 ...
## Iteration 800 ...
## Iteration 825 ...
## Iteration 850 ...
## Iteration 875 ...
## Iteration 900 ...
## Iteration 925 ...
## Iteration 950 ...
## Iteration 975 ...
## Iteration 1000 ...
## Gibbs sampling completed!
tmResult <- posterior(topicModel2)
theta <- tmResult$topics
beta <- tmResult$terms
topicNames <- apply(terms(topicModel2, 10), 2, paste, collapse = " ")  # reset topicnames

Extracting and visualizing the topics

# Step 1: Check dimensions
n_theta <- nrow(theta)
n_topicdata <- nrow(topic_data)  # number of rows (one per sentence), not length()

cat("Number of rows in theta: ", n_theta, "\n")
## Number of rows in theta:  606997
cat("Number of documents in textdata: ", n_topicdata, "\n")
## Number of documents in textdata:  606997
# Check if textdata contains all the documents in theta
common_ids <- intersect(rownames(theta), topic_data$doc_id) # Assuming textdata has a 'doc_id' column

# Filter textdata to include only the documents present in theta
topicdata_filtered <- topic_data[topic_data$doc_id %in% common_ids, ]

# Check dimensions after filtering
n_topicdata_filtered <- nrow(topicdata_filtered)
cat("Number of documents in filtered textdata: ", n_topicdata_filtered, "\n")
## Number of documents in filtered textdata:  606997
# Align rownames of theta with filtered textdata
theta_aligned <- theta[rownames(theta) %in% topicdata_filtered$doc_id, ]

# Step 2: Combine data
full_data <- data.frame(theta_aligned, decade = topicdata_filtered)

# get mean topic proportions per decade
# topic_proportion_per_decade <- aggregate(theta, by = list(decade = textdata$decade), mean)
# set topic names to aggregated columns
colnames(full_data)[2:(K+1)] <- topicNames
# reshape data frame
vizDataFrame <- melt(full_data)
## Using data market servic technolog manag • cloud solut secur provid, decade.text as id variables
#Examine topic names
#enframe(): Converts a named list into a dataframe.
topics <- enframe(topicNames, name = "number", value = "text") %>%
  unnest(cols = c(text))
 
topics
## # A tibble: 6 × 2
##   number  text                                                                  
##   <chr>   <chr>                                                                 
## 1 Topic 1 news servic page copyright california word target bodi length initi   
## 2 Topic 2 research develop technolog climat system energi disast chang scienc w…
## 3 Topic 3 financi busi result year increas includ million compani oper insur    
## 4 Topic 4 state forc unit secur countri nation support china region intern      
## 5 Topic 5 program fund committe million support health hous state act includ    
## 6 Topic 6 data market servic technolog manag • cloud solut secur provid

Final Themes

Theme 1. Climate and Disaster Risk Management through Technological and Developmental Research
Theme 2. Data-Driven Market and Cloud-Based Technological Solutions for Management and Security
Theme 3. Institutional (Academic/Governmental) Recognition and Innovation in Technological Development
Theme 4. Financial Performance, Insurance Coverage, and Operational Growth in the Corporate Sphere
Theme 5. International and National Security Forces, Regional Support, and Development Efforts
Theme 6. National Funding, Committee-Led Programs, and Government-Supported Initiatives

Explaining the topics and themes and their relevance to the project

## Topic 1: Words: climat, disast, energi, develop, research, chang, technolog, impt, risk, system

Assessment: This cluster revolves around climate and disaster contexts, focusing on energy, development, research, and technological changes. The presence of words like “climat” and “disast” highlights environmental and crisis scenarios, while “technolog,” “develop,” and “research” suggest ongoing innovation and adaptation. Terms like “risk” and “system” imply structured approaches to managing vulnerabilities.

Theme: Climate and Disaster Risk Management through Technological and Developmental Research

Relevance to Project: As my project explores how AI is framed in disaster-related press releases, understanding how climate-related disasters are linked to energy systems, research, and technological changes is crucial. This topic indicates that AI might be positioned as a tool for risk assessment and strategic development in mitigating climate-driven disasters, further connecting energy and systems thinking to resilience planning.

## Topic 2: Words: data, market, technolog, manag, solut, cloud, secur, busi, provid

Assessment: This cluster centers on data-driven market and technological solutions. Words like “data,” “technolog,” “manag,” “cloud,” and “secur” hint at the infrastructure behind digital and analytic tools. “Market,” “busi,” and “provid” suggest commercial offerings and service provision in a technical domain, likely aiming to improve the management of crises through better data usage and secure, cloud-based solutions.

Theme: Data-Driven Market and Cloud-Based Technological Solutions for Management and Security

Relevance to Project: In disaster scenarios, AI tools often rely on secure, scalable, and cloud-based infrastructures to process large datasets. This topic suggests that organizational press releases may frame AI as part of a commercial, data-centric ecosystem, offering solutions that enhance situational awareness, market stability, and secure operations during crises.

## Topic 3: Words: univers, california, target, load, state, presid, award, bodi, develop, patent

Assessment: This cluster highlights academic and governmental elements: “univers,” “california,” “state,” and “presid” point to institutional settings. “Award,” “patent,” and “develop” indicate innovation and recognition of research achievements. “Target,” “load,” and “bodi” may relate to structural or logistical considerations. Overall, it suggests a network of universities, state bodies, and recognized innovations (patents and awards) focused on development.

Theme: Institutional (Academic/Governmental) Recognition and Innovation in Technological Development

Relevance to Project: AI’s framing may be influenced by academic research and patents recognized by states or universities. This topic indicates how organizational press releases might highlight university-based AI research, state-level acknowledgments, and patented technologies contributing to advanced disaster management strategies.

## Topic 4: Words: year, financi, busi, result, million, includ, increas, compani, oper, insur

Assessment: This cluster is strongly associated with financial and business outcomes. Words like “financi,” “busi,” “result,” “million,” and “increas” point to growth, investments, and company performance. “Insur” and “oper” refer to insurance and operations, suggesting risk management and protective measures. Collectively, it indicates a focus on economic performance, insurance coverage, and operational resilience.

Theme: Financial Performance, Insurance Coverage, and Operational Growth in the Corporate Sphere

Relevance to Project: In the context of disasters, press releases may highlight how AI-driven solutions contribute to financial stability, operational continuity, and insurance mechanisms. Understanding these financial narratives helps reveal how AI is framed as a critical tool for maintaining business resilience and mitigating disaster-related economic impacts.

## Topic 5: Words: state, forc, unit, secur, countri, nation, support, region, china, develop

Assessment: This cluster emphasizes geopolitical and security dimensions. Terms like “state,” “forc,” “unit,” “secur,” and “nation” suggest organized security forces or national defense strategies. “Region,” “china,” and “develop” indicate international context and development efforts. It suggests a global, national-security-oriented perspective, where multiple countries, including China, are involved in supportive or strategic roles.

Theme: International and National Security Forces, Regional Support, and Development Efforts

Relevance to Project: For disaster management, press releases might frame AI as integral to national and international coordination efforts. Understanding how AI supports security units, international collaborations, and development projects helps in seeing how AI’s role is communicated as part of broader geopolitical and disaster response strategies.

## Topic 6: Words: fund, committe, support, heh, million, program, nation, includ, hous, state

Assessment: This cluster centers on funding, committees, and national-level programs. Words like “fund,” “committe,” and “support” highlight financial and organizational backing. “Program,” “nation,” “state,” and “hous” suggest government-backed initiatives, possibly housing or resource allocation. “heh” may be a tokenization artifact, but the rest indicates structured, well-funded, national-level support programs.

Theme: National Funding, Committee-Led Programs, and Government-Supported Initiatives

Relevance to Project: AI might be introduced or expanded within such funded and committee-driven national programs to improve disaster readiness, resource distribution, and infrastructural support. This topic implies that organizational press releases could present AI as part of well-funded initiatives that enhance efficiency and resilience at the national and state levels.

Overall, these interpretations and themes give a sense of how AI in disaster scenarios is discussed across various contexts—academic, financial, national-security, and programmatic—offering insights into the multifaceted framing of AI in organizational press releases across the United States.